

Search for: All records

Creators/Authors contains: "Stone, Gregory"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Testing is a part of education around the world; however, there are concerns that the consequences of testing are underexplored within current educational scholarship. Moreover, usability studies are rare within education. One aim of the present study was to explore the usability of a mathematics problem-solving test called the Problem Solving Measures–Computer-Adaptive Test (PSM-CAT), designed for students in grades six to eight (ages 11–14). The second aim of this mixed-methods research was to unpack consequences-of-testing validity evidence related to the results and test interpretations, leveraging the voices of participants. A purposeful, representative sample of over 1000 students from rural, suburban, and urban districts across the USA was administered the PSM-CAT followed by a survey; approximately 100 of those students were interviewed after test administration. Findings indicated that (1) participants engaged with the PSM-CAT as intended and found it highly usable (e.g., most respondents were able to find and use the calculator, and several students commented that they engaged with the test as intended) and (2) the benefits of testing largely outweighed any negative outcomes (e.g., 92% of students interviewed had positive attitudes towards the testing experience), which in turn supports consequences-of-testing validity evidence for the PSM-CAT. This study provides an example of a usability study for educational testing and builds on previous calls for greater consequences-of-testing research.
    Free, publicly-accessible full text available June 1, 2026
  2. This study explored how mathematics problem-solving constructed-response tests compared in terms of item psychometrics when administered to eighth-grade students in two different static formats: paper-pencil and computer-based. Quantitative results indicated similar performance across all psychometric indices, both for the overall tests and at the item level.
    Free, publicly-accessible full text available March 8, 2026
  3. Kosko, Karl W; Caniglia, Joanne; Courtney, Scott A; Zolfaghari, Maryam; Morris, Grace A (Ed.)
    Free, publicly-accessible full text available November 10, 2025
  4. Kosko, Karl W; Caniglia, Joanne; Courtney, Scott A; Zolfaghari, Maryam; Morris, Grace A (Ed.)
    Free, publicly-accessible full text available November 10, 2025
  5. Kombe, Dennis; Wheeler, Ann (Ed.)
    The purpose of this proceeding is to share a component of a validity argument for a new, computer-adaptive mathematics Problem-Solving Measure designed for grades six through eight (PSM 6-8). The PSM is a single test that uses computer-adaptive features to measure students' problem-solving performance relative to instructional standards.
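    The abstract above names computer-adaptive features without describing their mechanics. As an illustration only, the core selection step of a Rasch-based computer-adaptive test can be sketched as choosing the unadministered item with maximum Fisher information at the current ability estimate. The item bank, item ids, and difficulty values below are hypothetical, not taken from the PSM 6-8.

    ```python
    import math

    def rasch_prob(theta, b):
        """Probability of a correct response under the Rasch model."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def item_information(theta, b):
        """Fisher information of a Rasch item at ability theta: p(1 - p)."""
        p = rasch_prob(theta, b)
        return p * (1.0 - p)

    def next_item(theta, item_bank, administered):
        """Pick the unadministered item with maximum information at theta."""
        candidates = [i for i in item_bank if i not in administered]
        return max(candidates, key=lambda i: item_information(theta, item_bank[i]))

    # Hypothetical item bank: item id -> Rasch difficulty.
    item_bank = {"q1": -1.2, "q2": -0.4, "q3": 0.0, "q4": 0.7, "q5": 1.5}
    # After administering q3, the next item is the one whose difficulty is
    # closest to the current ability estimate (information peaks at theta = b).
    print(next_item(0.1, item_bank, administered={"q3"}))
    ```

    In a full CAT, the ability estimate would be re-estimated after each response and the loop repeated until a stopping rule (e.g., a standard-error threshold) is met.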
  6. Smith, Richard (Ed.)
    Lengthy standardized assessments decrease instructional time while increasing concerns about student cognitive fatigue. This study presents a methodological approach for item reduction within a complex assessment setting using the Problem Solving Measure for Grade 6 (PSM6). Five item-reduction methods were used to shorten the PSM6, and each shortened instrument was evaluated through validity evidence for test content, internal structure, and relationships to other variables. The two quantitative methods (Rasch model and point-biserial) produced the best-performing shortened assessments psychometrically but were not representative of all content subdomains, while the three qualitative (content-preservation) methods produced psychometrically poor assessments that retained all subdomains. Specifically, the ten-item Rasch and ten-item point-biserial shortened tests demonstrated the strongest overall validity evidence, but future research is needed to explore the psychometric performance of these versions in a new, independent sample and the necessity of subdomain representation. The study provides a methodological framework researchers can use to reduce the length of existing instruments while identifying how the various reduction strategies may sacrifice different information from the original instrument. Practitioners are encouraged to carefully examine the extent to which their reduced instrument aligns with their pre-determined criteria.
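    The point-biserial reduction strategy mentioned above can be sketched as ranking items by their corrected point-biserial correlation (each 0/1 item against the total score excluding that item) and keeping the top k. This is a minimal illustration under assumed data; the response matrix and cutoff are invented, and the actual PSM6 procedure may differ in its correction and tie-handling.

    ```python
    import statistics

    def reduce_items(responses, k):
        """Rank items by corrected point-biserial (item vs. rest-of-test score)
        and return the indices of the k highest-ranking items, sorted."""
        n_items = len(responses[0])
        totals = [sum(row) for row in responses]
        ranked = []
        for j in range(n_items):
            col = [row[j] for row in responses]
            rest = [t - c for c, t in zip(col, totals)]
            # Pearson correlation of a 0/1 item with the rest-score equals the
            # corrected point-biserial (statistics.correlation needs Python 3.10+).
            ranked.append((statistics.correlation(col, rest), j))
        return sorted(j for _, j in sorted(ranked, reverse=True)[:k])

    # Hypothetical 0/1 response matrix: 6 students x 4 items.
    responses = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(reduce_items(responses, 2))  # keeps the two most discriminating items
    ```

    A content-preservation method would instead constrain the kept set to cover every subdomain, which, as the abstract notes, can trade away psychometric quality.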
  7. Lamberg, Teruni; Moss, Diana (Ed.)
    Depth-of-knowledge (DOK) is a means to communicate the cognitive demand of tasks and is often used to categorize assessment items. Webb's (2002) framework has been applied across content areas. The aim of this two-phase iterative study was to modify Webb's DOK framework for word problems. Through work with school partners, this iterative design-based research study provides supportive evidence for a modified DOK framework reflecting levels of complexity in word problems. The resulting modified framework presents an opportunity for mathematics educators to reflect on various aspects of cognitive complexity.
  8. Determining the most appropriate method of scoring an assessment depends on multiple factors, including the intended use of results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from each method are not typical with constructed-response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment, the Problem Solving Measure for Grade 6 (PSM6). After the assessment was scored both ways, Rasch dichotomous and partial credit analyses indicated similarly strong psychometric findings across models. Student outcome measures on the PSM6, scored both dichotomously and with partial credit, demonstrated a strong, positive, significant correlation. Similar demographic patterns were noted regardless of scoring method. Both scoring methods produced similar results, suggesting that either would be appropriate to use with the PSM6.
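    To make the two scoring approaches concrete, here is a small, hypothetical illustration (invented data, not from the PSM6) of scoring the same constructed responses with partial credit and dichotomously (only full credit counts as correct), then correlating student totals, mirroring the kind of comparison described above.

    ```python
    import statistics

    # Hypothetical constructed-response scores for 5 students on 4 items:
    # 0 = incorrect, 1 = partial credit, 2 = full credit.
    partial_scores = [
        [2, 2, 1, 0],
        [2, 1, 1, 1],
        [1, 0, 2, 0],
        [0, 1, 0, 0],
        [2, 2, 2, 2],
    ]

    def totals_partial(matrix):
        """Total score when partial credit is awarded."""
        return [sum(row) for row in matrix]

    def totals_dichotomous(matrix, full=2):
        """Total score when only full-credit responses count as correct."""
        return [sum(1 for s in row if s == full) for row in matrix]

    tp = totals_partial(partial_scores)      # [5, 5, 3, 1, 8]
    td = totals_dichotomous(partial_scores)  # [2, 1, 1, 0, 4]
    # Pearson correlation of the two totals (statistics.correlation, Python 3.10+);
    # a strong positive value indicates the rankings largely agree.
    print(statistics.correlation(tp, td))
    ```

    A full replication of the study would fit Rasch dichotomous and partial credit models to each scoring, not just correlate raw totals; this sketch only shows why the two scorings can order students similarly.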